(54) Concurrent memory control for turbo decoders



(57) The concurrent memory control turbo decoder solution of this invention uses interleaved forward-reverse
addressing with a single port memory and a simplified scratch memory. This invention computes beta state metrics
for a sliding window and stores them in a scratch memory in a first addressing order (1000). For each sliding
window, this invention computes alpha state metrics for the sliding window, reads beta state metrics from the
scratch memory in the addressing order of the current repetition (1005) and combines alpha state metrics and beta
state metrics in an extrinsic block. This invention then computes beta state metrics for a next sliding window and
stores the data in the scratch memory in the addressing order of the current repetition. The addressing order is
toggled (1001) for the next repetition (1004) until a frame of data is decoded.

Description

TECHNICAL FIELD OF THE INVENTION 

[0001] The technical field of this invention is turbo decoders used in forward error correction. 
BACKGROUND OF THE INVENTION 

[0002] Turbo codes are a type of forward error correction code with powerful capabilities. These codes are becoming
widely used in many applications such as wireless handsets, wireless base stations, hard disk drives, wireless LANs,
satellites and digital television. Turbo codes consist of a concatenation of convolutional codes, connected by an
interleaver, with an iterative decoding algorithm.

[0003] An example of a prior art rate 1/3 parallel-concatenated encoder is shown in Figure 1.
[0004] Input data stream 100 ($x_m$) is supplied unmodified to multiplexer 104 at input 106. The two Recursive
Systematic Convolutional (RSC) encoders 102 and 103 [...]. After [...] RSC encoders 102 and 103, the resulting bit
streams are supplied to multiplexer 104 at inputs [...] respectively. Block 101 is an Interleaver (I) which randomly
re-arranges the information bits to decorrelate [...]. RSC encoders 102 and 103 generate respective $p0_m$ and
$p1_m$ bit streams. Multiplexer 104 reassembles these $x_m$, $p0_m$ and $p1_m$ bit streams into a resulting output
bit stream 105 ($x_0$, $p0_0$, $p1_0$, ...).
[0005] Figure 2 illustrates a functional block diagram of a prior art turbo decoder 200. Iterative turbo decoder 200
includes a pair of maximum-a-posteriori (MAP) blocks 202 and 203. Each iteration requires the execution of two MAP
decodes to generate two sets of extrinsic information. The first MAP decoder 202 uses the non-interleaved data as
its input and the second MAP decoder 203 uses the interleaved data from the interleaver block 201 as its input. The
MAP decoders 202 and 203 compute the extrinsic information as:

$$W_n = \log \frac{\Pr(x_n = 1 \mid R_1^n)}{\Pr(x_n = 0 \mid R_1^n)} \qquad [1]$$

where: $R_1^n = (R_0, R_1, \ldots, R_{n-1})$, which are the received symbols. MAP decoders 202 and 203 also compute
the a posteriori probabilities:

$$\Pr(x_n = i \mid R_1^n) = \frac{1}{\Pr(R_1^n)} \sum \Pr(x_n = i, S_n = m', S_{n-1} = m) \qquad [2]$$

where: $S_n$ is the state at time n in the trellis of the constituent convolutional code.
[0006] The terms in the summation can be expressed in the form

$$\Pr(x_n = i, S_n = m', S_{n-1} = m) = \alpha_{n-1}(m)\, \gamma_n^i(m, m')\, \beta_n(m') \qquad [3]$$
where: the quantity

$$\gamma_n^i(m, m') = \Pr(S_n = m', x_n = i, R_n \mid S_{n-1} = m) \qquad [4]$$

is called the branch metric, the quantity

$$\alpha_n(m') = \Pr(S_n = m', R_1^n) \qquad [5]$$

is called the forward (or alpha) state metric, and the quantity

$$\beta_n(m') = \Pr(R_{n+1}^{N} \mid S_n = m') \qquad [6]$$

EP1 261 139 A2 

is called the backward (or beta) state metric.

The alpha and beta state metrics are computed recursively by forward and backward recursions given by

$$\alpha_n(m') = \sum_{m,\,i} \alpha_{n-1}(m)\, \gamma_n^i(m, m') \qquad [7]$$

and

$$\beta_{n-1}(m) = \sum_{m',\,i} \beta_n(m')\, \gamma_n^i(m, m') \qquad [8]$$
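
The recursions [7] and [8] can be illustrated with a small probability-domain sketch. Everything in it is an assumption made for illustration: the two-state toy trellis (next state = state XOR input bit), the arbitrary positive branch metrics produced by `toy_gamma`, and the function names. A hardware decoder would normally use log-domain metrics and sliding windows, which this sketch leaves out.

```python
import math

def toy_gamma(num_steps):
    """Hypothetical branch metrics for a 2-state toy trellis.

    gamma[n][(i, m, m2)] holds the metric for input bit i on the
    transition m -> m2 at trellis step n + 1; the values are arbitrary.
    """
    gamma = []
    for n in range(num_steps):
        step = {}
        for m in (0, 1):
            for i in (0, 1):
                step[(i, m, m ^ i)] = 0.1 + 0.2 * ((n + m + 2 * i) % 4)
        gamma.append(step)
    return gamma

def forward_backward(gamma, num_states, alpha_start, beta_end):
    steps = len(gamma)
    # Forward recursion [7]:
    # alpha_n(m') = sum over (m, i) of alpha_{n-1}(m) * gamma_n^i(m, m')
    alpha = [list(alpha_start)]
    for n in range(1, steps + 1):
        cur = [0.0] * num_states
        for (i, m, m2), g in gamma[n - 1].items():
            cur[m2] += alpha[n - 1][m] * g
        alpha.append(cur)
    # Backward recursion [8]:
    # beta_{n-1}(m) = sum over (m', i) of beta_n(m') * gamma_n^i(m, m')
    beta = [None] * (steps + 1)
    beta[steps] = list(beta_end)
    for n in range(steps, 0, -1):
        prev = [0.0] * num_states
        for (i, m, m2), g in gamma[n - 1].items():
            prev[m] += beta[n][m2] * g
        beta[n - 1] = prev
    return alpha, beta

def log_likelihood_ratios(alpha, beta, gamma):
    """W_n of equation [1], built from the products of equation [3]."""
    ratios = []
    for n in range(1, len(gamma) + 1):
        prob = [0.0, 0.0]
        for (i, m, m2), g in gamma[n - 1].items():
            prob[i] += alpha[n - 1][m] * g * beta[n][m2]
        ratios.append(math.log(prob[1] / prob[0]))
    return ratios

gamma = toy_gamma(5)
# Start in the zero state; treat the end state as unknown (all ones).
alpha, beta = forward_backward(gamma, 2, [1.0, 0.0], [1.0, 1.0])
w = log_likelihood_ratios(alpha, beta, gamma)
```

A quick sanity check on such a sketch is that the sum over states of alpha_n(m) * beta_n(m) comes out the same at every step n, because each recursion only redistributes the same total over the states.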



[...] 207 completes the assembling of the output bit stream 208 ($x_0, x_1, \ldots$).

[...] represent the direction [...] of the data inputs for the blocks beta and alpha. Input bit streams 310 to 312
are labeled as parameters $X_{n,r}$, $P_{n,r}$ and $A_{n,r}$ respectively. Input bit streams 313 to 315 are labeled
as parameters $X_{n,f}$, $P_{n,f}$ and $A_{n,f}$ respectively. [...] Both the alpha state metric block 302 and beta
state metric block 303 calculate state metrics. Both start at a known location in the trellis, the zero state. The
encoder starts the block of n information bits (n = 5114, the frame size) at the zero state and after n cycles
through the trellis ends at some unknown state.

[0010] Without sliding windows, the frame size of the block would contain n × s × d = 327,296 bits. With sliding
windows, the processing involves r × s × d = 8192 bits, where r is 128. The memory size requirements are greatly
reduced through the use of sliding windows.
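
The saving quoted in paragraph [0010] can be checked with a few lines of arithmetic. This is only a sketch: the text gives n = 5114, r = 128 and the two totals, while the split of the remaining factor 64 into s = 8 trellis states and d = 8 bits per state metric is an assumption (only the product s × d follows from the quoted totals).

```python
N_FRAME = 5114    # frame size n, from the text
R_WINDOW = 128    # sliding window size r, from the text
S_STATES = 8      # ASSUMPTION: number of trellis states s
D_BITS = 8        # ASSUMPTION: bits per stored state metric d

def state_metric_bits(length, states=S_STATES, bits=D_BITS):
    """Bits needed to hold one metric per state for `length` trellis steps."""
    return length * states * bits

full_frame = state_metric_bits(N_FRAME)    # without sliding windows
one_window = state_metric_bits(R_WINDOW)   # with sliding windows
print(full_frame, one_window, full_frame / one_window)
```

Under these assumptions the per-window storage is roughly one fortieth of the full-frame figure.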

[...] the encoder back to the zero state.

[...] the state metrics are generated by alpha state metrics block [...]. The alpha state metrics are not stored
[...] block 305 uses this data as soon as [...].

[...] supplying the a priori inputs 310 to 315. The main memory size is computed as listed in Table 1.

Table 1

Main Memory Size    Number of Bits
x0                  5120 × 8 = 40,960
p0                  5120 × 8 = 40,960
p1                  5120 × 8 = 40,960
[...]               5120 × 8 = 40,960
A1                  5120 × 8 = 40,960
I                   5120 × 13 = 66,560
SX                  176 × 45 × 4 = 31,680
P2                  2560 × 8 = 20,480
P3                  2560 × 8 = 20,480
Totals              344,000 bits
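
The total in Table 1 can be reproduced directly from the row products. The sketch below simply re-adds the rows as printed; the tuple layout (label, words, bits per word, number of blocks) is an illustrative choice, and the fourth row is kept under a placeholder label because its name is not legible in the table.

```python
# (label, words, bits per word, number of blocks), copied from Table 1
TABLE_1 = [
    ("x0", 5120, 8, 1),
    ("p0", 5120, 8, 1),
    ("p1", 5120, 8, 1),
    ("unlabeled row", 5120, 8, 1),   # placeholder: label missing in the source
    ("A1", 5120, 8, 1),
    ("I", 5120, 13, 1),
    ("SX", 176, 45, 4),
    ("P2", 2560, 8, 1),
    ("P3", 2560, 8, 1),
]

def total_bits(rows):
    """Sum words * bits * blocks over all rows."""
    return sum(words * bits * blocks for _, words, bits, blocks in rows)

main_memory_bits = total_bits(TABLE_1)
print(main_memory_bits)
```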
[...] The sliding window block is shown in Figure 5. [...] usually equal to 4 times to 6 times the constraint
length. Upon setting all the states [...] zero and then executing [...]. This [...] a size of r+p.

[...] The beta [...] block must be processed [...] sub-blocks. Each sub-block [...] blocks can start. Therefore, it
takes some amount of [...] processing order. [...] processed twice. Unfortunately, the addresses are [...] blocks.
Such implementations are [...] a single port main memory combined with a combination of [...] scratch memory would
include [...] the complexity involved in meeting the [...] memory blocks would have 176 addresses [...] scratch
memory blocks [...] until each sub-block has [...].

[...] a turbo decoder using the dual [...] data to be decoded 900 come from the digital signal processor [...] main
memory 902 [...] a dual-port RAM. Memory control block 901 generates [...] for main memory 902 and addresses 906
for beta RAM 907. Data is passed to the alpha metrics block 904 and the beta metrics block 905 from two separate
[...]. Beta metrics block 905 [...] is used in the order [...]. [...] metrics block 904 passes its output directly to
the extrinsic block 909 [...] size must be used. The multiplexer 908 provides [...] completes computation of metric
output parameters 910 $W_{n,j}$.

[...] To avoid loss of processor cycles, [...] requires a dual port [...] memory; it also requires an eight-block
[...] main memory 902 having an array [...] is used in comparison to the order in which the [...].
SUMMARY OF THE INVENTION

[0022] This invention is a concurrent memory control [...]. The four-sliding-windows preferred embodiment requires
only a single port main memory and a simplified scratch memory. This is in contrast to conventional turbo decoders
which would employ a [...] main memory and an eight-block [...] memory. During each [...] happen for the scratch
memories. If a particular location in memory has been read, then that location [...] store its data.
[0023] During processing of the first beta sub-block the data memories for the systematic, parities, and a-priori are
read. The reliability portion of this data is written into the scratch memory in a reverse order. After the beta sub-block 
processing has finished, the alpha reliability data is loaded into the scratch RAM, but not the alpha prolog data. The 
turbo decoder controller starts a new state in which the alpha prolog data is read from the data memories and the data 
is stored in the scratch RAM. The maximum size of each of the scratch memories is equal to the maximum sum of the 
reliability and prolog sizes. A solution for the addressing requirements for interleaved forward and reverse addressing
order is also described.
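
The reuse of a just-read scratch location can be sketched in software. The following is a minimal model, not the decoder's actual control logic: `run_windows`, the list-based scratch memory and the integer placeholder values are all assumptions, and real hardware would perform the read and the write concurrently rather than in a loop. It shows that toggling the physical addressing order on each repetition lets data written in reverse order be read back in forward order from a single memory that is only one window deep.

```python
def run_windows(windows):
    """Model one scratch memory shared by a writer that traverses each
    window in reverse order and a reader that needs forward order.

    `windows` is a list of equal-length lists; returns what the reader sees.
    """
    size = len(windows[0])
    scratch = [None] * size
    ascending = True                      # first addressing order

    # Initial pass: write window 0, reverse data order, first addressing order.
    for addr, value in zip(range(size), reversed(windows[0])):
        scratch[addr] = value

    seen = []
    for k in range(len(windows)):
        ascending = not ascending         # toggle for this repetition
        addrs = range(size) if ascending else range(size - 1, -1, -1)
        incoming = reversed(windows[k + 1]) if k + 1 < len(windows) else None
        read_back = []
        for addr in addrs:
            read_back.append(scratch[addr])      # reader frees this location
            if incoming is not None:
                scratch[addr] = next(incoming)   # writer reuses it at once
        seen.append(read_back)
    return seen

frames = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
result = run_windows(frames)
print(result)
```

In this model no location is overwritten before it has been read, so a second (ping-pong) copy of the scratch memory is not needed.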

BRIEF DESCRIPTION OF THE DRAWINGS 

[0024] These and other aspects of this invention are illustrated in the drawings, in which: 
Figure 1 illustrates the high level functional block diagram of a prior art turbo decoder; 
Figure 2 illustrates a lower level functional block diagram of a prior art turbo decoder; 
Figure 3 illustrates a functional block diagram of a prior art MAP decoder;

Figure 4 illustrates breaking a block of size n into sliding window blocks of size r according to the prior art; 

Figure 5 illustrates the make-up of a prior art beta sliding block; 

Figure 6 illustrates the prior art processing of beta and alpha sub-blocks versus time; 

Figure 7 illustrates the prior art processing of four beta sliding windows in parallel; 

Figure 8 illustrates the prior art processing of four alpha sliding windows in parallel; 

Figure 9 illustrates the prior art use of ping-pong scratch memory in a four-sliding-windows conventional turbo
decoder;

Figure 10 illustrates the physical address order of scratch RAM in a first embodiment of this invention;

Figure 11 illustrates the processing of four beta sliding windows in parallel for a second embodiment of this
invention;

Figure 12 illustrates the physical address order of scratch RAM for the second embodiment of this invention;
Figure 13 illustrates the processing of beta and alpha sliding windows versus time; and

Figure 14 illustrates the concurrent memory control of this invention with interfacing for main memory, scratch
memories and beta memory in a four-sliding-windows turbo decoder.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

[0025] With four sliding windows assumed, Figure 10 illustrates the physical address order of scratch RAM for the
preferred embodiment. The beta processing state takes a maximum of

((128+48)×4)+12 = 716 cycles [9]
and the alpha prolog processing state takes

(48×4)+8 = 200 cycles. [10]

[0026] The 12 term in equation 9 and the 8 term in equation 10 arise from the extra cycles needed to set up the
respective states.
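
Equations [9] and [10] follow one pattern: a per-window length times the number of windows, plus a fixed setup cost. The sketch below restates that arithmetic; reading 128 as the reliability size, 48 as the prolog size and 4 as the number of sliding windows is an interpretation of the constants, and the function and parameter names are hypothetical.

```python
def beta_state_cycles(reliability=128, prolog=48, windows=4, setup=12):
    """Equation [9]: cycles for the beta processing state."""
    return (reliability + prolog) * windows + setup

def alpha_prolog_state_cycles(prolog=48, windows=4, setup=8):
    """Equation [10]: cycles for the alpha prolog processing state."""
    return prolog * windows + setup

beta_cycles = beta_state_cycles()
alpha_prolog_cycles = alpha_prolog_state_cycles()
print(beta_cycles, alpha_prolog_cycles)
```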
[0029] The disadvantage of the above addressing scheme is that it takes [...] MAP decoder [...] during the 200
cycles it takes to store the alpha prolog data. [...] reliability data is for sliding window 0 (RB0) 1101. Storing
this [...] of the alpha prolog state. This technique [...] (PA0) 1102 [...] function properly [...]. Waiting for
prolog [...] and 1204, as shown in Figure 12.

[0033] A second address pointer for prolog address region 1201 is added. The reading and writing of data to the
prolog address region 1201 is simpler than [...]. [...] data with respect to time never overlap with each [...].
Both the alpha and beta prolog data are read and processed at the beginning of the state 1300. [...] data is not
stored in the alpha scratch RAMs. Once the alpha prolog section has finished executing, then the [...] is free in
the scratch memory. When the beta starts the beta reliability section during [...] once in the current sliding
window reliability address region 1200 and then in the [...] region 1201. The first part of the beta reliability
data is the alpha prolog [...] in Figure 11. [...] the physical implementation of the memory [...] number of cycles
[...].

[0035] This new technique requires 716 + 56 = 772 cycles [...] when summing the number of [...] 13.3% cycle
improvement.



Table 2

State                                          Equation for             Number of Cycles for   Number of Cycles for
                                               Second Embodiment        First Embodiment       Second Embodiment
Determine sliding windows pointers             (10+1)×4                 44                     44
beta, alpha, extrinsic processing              (10+1)((128+48)×4+12)    7,876                  7,876
load alpha prolog data for sliding window '0'  (9)((48×1)+8)+(2×3)                             510
load alpha prolog data for 4 sliding windows   (9)((48×4)+8)+(2×3)      1,806
wait for extrinsics                            (10×1)+(1×2)             12                     12
start new sub-block                            11×1                     11                     11
wait for stopping criteria                     10                       10                     10
Total per MAP decode                           m                        9,759                  8,463
Total per iteration                            i = 2×m                  19,518                 16,926
Total per 10 iterations                        10×i                     195,180                169,260
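
The totals in Table 2 and the quoted 13.3% improvement can be re-derived from the per-state equations. The sketch below is a plain arithmetic check; the dictionary keys are shorthand invented here, and the split into states shared by both embodiments and an alpha prolog load that differs between them is read off the table rather than stated in the text.

```python
# Cycle counts per MAP decode for the states that both embodiments share.
shared = {
    "determine_pointers": (10 + 1) * 4,
    "beta_alpha_extrinsic": (10 + 1) * ((128 + 48) * 4 + 12),
    "wait_for_extrinsics": (10 * 1) + (1 * 2),
    "start_new_sub_block": 11 * 1,
    "wait_for_stopping_criteria": 10,
}

# The alpha prolog load is the only state that differs.
prolog_load_first = 9 * ((48 * 4) + 8) + (2 * 3)    # four sliding windows
prolog_load_second = 9 * ((48 * 1) + 8) + (2 * 3)   # sliding window 0 only

per_map_first = sum(shared.values()) + prolog_load_first
per_map_second = sum(shared.values()) + prolog_load_second

def per_ten_iterations(per_map_decode):
    """Two MAP decodes per iteration, ten iterations."""
    return 10 * 2 * per_map_decode

improvement = (per_map_first - per_map_second) / per_map_first
print(per_map_first, per_map_second, f"{improvement:.1%}")
```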



[0037] Figure 14 illustrates a block diagram of a MAP decoder architecture using the concurrent memory control of
a preferred embodiment of this invention. This preferred embodiment is a four-sliding-windows architecture which
requires four scratch memories and four beta memories. Data for the four sliding windows is efficiently processed in
a four-cycle beta metrics block architecture. Figure 14 is an expanded view of Figure 3, showing blocks of data to be
decoded 1400 coming from the digital signal processor (DSP) to main memory 1402. Concurrent memory controller 1401
provides addresses 1411 for main memory 1402, addresses 1412 for scratch memory 1403 and addresses 1406 for beta
RAM 1407. Alpha metrics block 1404 and beta metrics block 1405 both interface with the scratch memory 1403. Beta
metrics block 1405 writes to scratch memory 1403 and alpha metrics block 1404 reads from scratch memory 1403.
Concurrent memory interface controller 1401 controls all memory operations in main memory 1402, scratch memory
1403 and beta RAM 1407. Scratch memory 1403 employs 45-bit scratch memory words consisting of systematic bits (8),
parity bits (8×2=16), a-priori bits (8) and interleaver data (13). The interleaver data is the extrinsic data
address used when storing the extrinsic information. Concurrent memory controller 1401 drives the flow of data
according to the prescription of Figures 10 through 12. It performs control and address generation for all three
memory blocks. Multiplexer 1408 provides the interface between the four separate portions of beta memory 1407 and
the extrinsic block 1409. Extrinsic block 1409 completes computation of the metric output parameters 1410 $W_{n,j}$.
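
The 45-bit scratch word described above can be modelled as a packed integer. This is only a sketch: the field order, the field names, the treatment of every field as an unsigned value, and the `pack_word` and `unpack_word` helpers are assumptions, since the text gives the field widths but not the layout.

```python
# (field name, width in bits); widths from the text, order ASSUMED
FIELDS = (
    ("systematic", 8),
    ("parity0", 8),
    ("parity1", 8),
    ("a_priori", 8),
    ("interleaver_address", 13),
)
WORD_BITS = sum(width for _, width in FIELDS)

def pack_word(values):
    """Pack a dict of unsigned field values into one integer word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack_word(word):
    """Inverse of pack_word: split a word back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values

sample = {"systematic": 200, "parity0": 17, "parity1": 255,
          "a_priori": 0, "interleaver_address": 5113}
packed = pack_word(sample)
print(WORD_BITS, unpack_word(packed) == sample)
```

With 176 addresses per scratch memory and four scratch memories, this word width gives 176 × 45 × 4 = 31,680 bits, which matches the scratch memory row of Table 1.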
[0038] Turbo coders are becoming widely used in many fields of communications. Turbo decoders are iterative
decoders which execute the MAP decoder twice per iteration. The typical number of iterations ranges from 6 to 12.
It is important to reduce the cycle count per decode, which improves the system performance. A novel approach to
limiting the number of memories required and a method of controlling the memories efficiently are described here.
Most of the alpha prolog data and all of the alpha reliability data are folded into the cycles required to generate
the beta state metrics. This reduces the cycle count of the decode.



Claims 

1. A method of turbo decoding comprising the steps of:

computing beta state metrics for a sliding window of data,

storing said beta state metrics for the sliding window of data in a scratch memory in a first addressing order;
then repetitively for each sliding window

toggling a current addressing order for a current repetition between the first addressing order and a second
addressing order, said second addressing order being opposite to said first addressing order,
computing alpha state metrics for the sliding window of data,

reading beta state metrics from the scratch memory in the addressing order for the current repetition,
combining the computed alpha state metrics and the recalled beta state metrics in an extrinsic block thereby
producing extrinsic outputs,

computing beta state metrics for a next sliding window of data,

storing said beta state metrics for the next sliding window of data in a scratch memory in the addressing order
of the current repetition;

until a frame of data including a plurality of sliding windows is decoded.

2. The method of turbo decoding of claim 1, wherein:
order opposite to said third addressing order.
A method of turbo decoding comprising the steps of:

providing a plurality of scratch memories, each of said scratch memories having a reliability portion and a
prolog portion, the plurality of scratch memories employed in a predetermined circular order,

[...] a second addressing order;
[...] addressing order, said third addressing order being opposite to said [...] addressing order,

combining the computed alpha state metrics and the recalled beta state metrics in an extrinsic block thereby
producing extrinsic outputs,

computing beta state metrics for a next sliding window [...] of the data [...] reliability portion of
a [...] current addressing order and in the
prolog portion of a second succeeding scratch memory in the second addressing order;
until a frame of data including a plurality of sliding windows is decoded.

The method of turbo decoding of claim 1, wherein:

the step of [...] an addressing order for a [...]
fifth addressing order.

A turbo decode apparatus comprising:

[...] beta metrics from data recalled from said [...]
a plurality of scratch memories connected to said beta metrics block for storing said beta metrics, each [...]
[...] said beta metrics block storing beta metrics formed by said beta metrics
block, said beta metrics memory having a plurality of sections;
a multiplexer having a plurality of inputs each [...] beta metrics memory
and an output outputting data recalled from [...];
an extrinsic block connected to said alpha metrics block and [...] extrinsic outputs; and
[...] said set of a plurality of scratch [...]
alternating addressing order, said alpha metrics reads from a second following scratch memory in a third
addressing order opposite to said first addressing order, and said alpha metrics reads from a third following
scratch memory in a fourth alternating order opposite to said second alternating order.